---
title: "Alcohol on Academics"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: journal
      navbar-bg: blue
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
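# Load (and install if missing) the packages used by the dashboard.
pacman::p_load(tidyverse, plotly, dplyr, DT)
# Read the survey data and keep only the variables used in this analysis.
totaldata <- read_csv("./student-lpor.csv")
thedata <- select(totaldata, sex, Pstatus, studytime, activities, famrel, Dalc, Walc, G1, G2)
# attach() is not needed: every chunk refers to thedata explicitly.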
```
Introduction
===
Column {.tabset data-width=600}
---
<style>
h1 {text-align: center;}
</style>
<h1>
<font size="4">
***An Analysis of Alcohol Consumption and Social Factors on Academic Success***
</font>
</h1>
### Introduction to Study
This study examines how the social factors that high school students regularly face relate to their academic performance. As a youth ministry volunteer, I care deeply about what high school students deal with in their daily lives. These social factors heavily influence their lives and often affect how they do in school. The variables used here are only a subset of the original data set, but I believe they are the most relevant to my objectives. The data set contains 649 surveys filled out by students from two different high schools. The data had already been extensively cleaned, so little further preparation was needed. Most of the variables are answered on a categorical scale from 1 to 5 (1 being the lowest/worst and 5 being the highest/best). Some variables, such as grades, are quantitative, and some, like participation in extracurricular activities, are binary.

My final project attempts to answer the following research questions:

- Do high school students involved in extracurricular activities do better academically?
- Does alcohol consumption have an effect on academic performance?
- Is there a correlation between family relationships and academic performance?
### Total Data Table
```{r}
DT::datatable(thedata)
```
Column {data-width=400}
---
### Variables
- **activities**: Involvement in extracurricular activities (Binary: yes or no)
- **studytime**: Number of hours studying per week (Categorical: <2 hours, 2 to 5 hours, 5 to 10 hours or >10 hours)
- **famrel**: Family relationship score (Categorical: 1-5)
- **Pstatus**: Parents live apart or together (Binary: A or P)
- **sex**: Sex of student
- **Dalc**: Alcohol consumption on weekdays (Categorical: 1-5)
- **Walc**: Alcohol consumption on weekends (Categorical: 1-5)
- **G1**: Semester 1 grade (0-20)
- **G2**: Semester 2 grade (0-20)
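
The hour ranges listed for **studytime** can be attached to the data as labels. The sketch below is a minimal illustration that assumes the column stores the four ranges as the numeric codes 1 to 4; `studytime_codes` and its values are hypothetical and are not taken from the survey.

```r
# Hypothetical numeric codes as they might appear in the studytime column.
studytime_codes <- c(1, 3, 2, 4, 2)

# Map the assumed codes 1-4 to the hour ranges listed above, keeping their order.
studytime_f <- factor(
  studytime_codes,
  levels = 1:4,
  labels = c("<2", "2-5", "5-10", ">10"),
  ordered = TRUE
)

# Tally how many responses fall into each range.
print(table(studytime_f))
```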
Key Var. Summary
===
Column {.tabset data-width=550}
---
### Alcohol on Weekdays
```{r}
Dalc_count<- count(thedata, Dalc)
Dalc_count$percent<- round(Dalc_count$n/sum(Dalc_count$n)*100, 2)
# Treat the 1-5 score as a category so each level gets its own slice colour.
pie1<- ggplot(Dalc_count, aes(x="", y=percent, fill = factor(Dalc)))+
  geom_bar(stat = "identity", width = 1, color = "white")
pie1<- pie1 + coord_polar("y", start=0) +
  geom_text(aes(label = paste0(percent, "%")), fontface = "bold", color = "black", position = position_stack(vjust = 0.5))
pie1<- pie1 + theme_void()+ guides(fill=guide_legend(title="Alcohol on Weekdays")) + ggtitle("Percentages of Students Who Consumed Alcohol on Weekdays") +
  theme(text = element_text(size = 10))
pie1
```
### Alcohol on Weekends
```{r Pie 1}
Walc_count<- count(thedata, Walc)
Walc_count$percent<- round(Walc_count$n/sum(Walc_count$n)*100, 2)
# Treat the 1-5 score as a category so each level gets its own slice colour.
pie2<- ggplot(Walc_count, aes(x="", y=percent, fill = factor(Walc)))+
  geom_bar(stat = "identity", width = 1, color = "white")
pie2<- pie2 + coord_polar("y", start=0) +
  geom_text(aes(label = paste0(percent, "%")), fontface = "bold", color = "black", position = position_stack(vjust = 0.5))
pie2<- pie2 + theme_void()+ guides(fill=guide_legend(title="Alcohol on Weekends")) + ggtitle("Percentages of Students Who Consumed Alcohol on Weekends") +
  theme(text = element_text(size = 10))
pie2
```
### Extracurriculars
```{r Bar1}
thedata %>%
ggplot(aes(x=activities, fill = sex))+
geom_bar()+
labs(x="Participates in Extracurriculars", y="Frequency", title ="Distribution of Students in Activities") +
guides(fill=guide_legend(title="Sex"))+
scale_fill_discrete(labels = c('Female', 'Male'))
```
### Family Relationships
```{r Bar2}
thedata %>%
ggplot(aes(x=famrel, fill = Pstatus))+
geom_bar()+
labs(x="Family Relationship Score", y="Frequency", title ="Distribution of Family Scores with Parent Living Arrangement")+
guides(fill=guide_legend(title="Cohabitation"))+
scale_fill_discrete(labels = c('Apart', 'Together'))
```
Column {data-width=450}
---
### Key Variable Summaries
**Alcohol Consumption** - Most students do not consume alcohol on weekdays, but 198 of them reported drinking at least some. More startling, 34 students say they consume heavy amounts of alcohol during the week, and 134 consume excessive amounts on the weekend. As we look at trends related to alcohol, it is essential to keep in mind the number of students being affected.

**Extracurriculars** - 383 females answered the survey, compared with only 266 males. It will be helpful to keep this imbalance in mind when comparing these variables. Participation in out-of-school activities is split almost evenly, with slightly fewer females participating than not and slightly more males participating than not.

**Family Relations** - A total of 152 students reported having bad or okay relationships with their family. This matters on a human level: high school students should have help with these factors, since they could contribute to poor academic performance, as the data trends may show. Most students reported that their parents live together, while only 80 have parents living apart.
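
Counts like the ones quoted in these summaries can be produced with simple tallies. The sketch below is a minimal illustration on a small made-up data frame: `toy` and its values are hypothetical, not the survey data. The cut-offs are also assumptions, since the text does not state them: a score above 1 is read as "some" alcohol, a score of 4 or 5 as "heavy", and a family score of 3 or lower as "bad or okay". With the dashboard's data, `thedata` would take the place of `toy`.

```r
# Toy stand-in for the survey data (values are made up for illustration).
toy <- data.frame(
  Dalc   = c(1, 1, 2, 4, 1, 5, 3, 1),
  Walc   = c(1, 3, 4, 5, 2, 5, 4, 1),
  famrel = c(4, 5, 3, 2, 4, 1, 5, 4)
)

# Assumed cut-offs: > 1 means "some" alcohol, >= 4 means "heavy",
# and a family score <= 3 means "bad or okay".
summary_counts <- c(
  weekday_any   = sum(toy$Dalc > 1),
  weekday_heavy = sum(toy$Dalc >= 4),
  weekend_heavy = sum(toy$Walc >= 4),
  famrel_low    = sum(toy$famrel <= 3)
)
print(summary_counts)
```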
Activities/Studytime
===
Column {.tabset data-width=550}
---
### Study vs. S1
```{r}
thedata%>%
ggplot(aes(x=studytime, y= G1))+
geom_point(col= "darkgreen")+
geom_smooth(col="darkred")+
labs(x="Study Time in Hours/Week", y="Semester 1 Grade", title = "Distribution of Semester 1 Grades Over Study Time") +
  scale_x_continuous(breaks = 1:4, labels = c("<2", "2-5", "5-10", ">10"))
```
### Study vs. S2
```{r}
thedata%>%
ggplot(aes(x=studytime, y= G2))+
geom_point(col= "darkblue")+
geom_smooth(col="orange")+
labs(x="Study Time in Hours/Week", y="Semester 2 Grade", title = "Distribution of Semester 2 Grades Over Study Time") +
  scale_x_continuous(breaks = 1:4, labels = c("<2", "2-5", "5-10", ">10"))
```
### Activities vs. S1
```{r}
thedata %>% ggplot(aes(x = factor(activities), y = G1)) +
geom_boxplot(outlier.color ="red", outlier.size = 3, fill = "lightblue") +
labs(x= "Participates in Extracurriculars", y= "Semester 1 Grade", title = "Distribution of Semester 1 Grades Over Activities")
```
### Activities vs. S2
```{r}
thedata %>% ggplot(aes(x = factor(activities), y = G2)) +
geom_boxplot(outlier.color = "purple4", outlier.size = 3, fill = "lightgreen") +
labs(x= "Participates in Extracurriculars", y= "Semester 2 Grade", title = "Distribution of Semester 2 Grades Over Activities")
```
Column {data-width=450}
---
### Analysis and Conclusions
**Study Time** - There is a clear positive relationship between study time and academic performance. Students who studied more than 5 hours per week averaged about 2 points higher in both semesters. For those who studied more than 10 hours, grades plateau in Semester 1 and dip slightly in Semester 2, which suggests a sweet spot for hours studied per week. More studying generally goes with better grades, but only up to a point.

**Extracurriculars** - In both semesters, participation in extracurricular activities is associated with better grades. This could be due to the structure that extracurriculars or sports add to students' lives, which creates a better rhythm for school work. Overall, students who take part in activities outside the classroom tend to perform better academically.
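
The visual comparisons in this tab could be backed by group means and a simple test. The sketch below is one possible approach, not the method used for the conclusions above: it computes the mean grade at each study-time level and runs a Welch two-sample t-test on grades by activity participation. The data frame `toy` and its values are made up for illustration, and the study-time codes 1 to 4 are an assumption about how the column is stored.

```r
# Toy stand-in for the survey data (values are made up for illustration).
toy <- data.frame(
  studytime  = c(1, 1, 2, 2, 3, 3, 4, 4),
  activities = c("no", "yes", "no", "yes", "no", "yes", "no", "yes"),
  G1         = c(9, 10, 11, 12, 13, 14, 12, 14)
)

# Mean first-semester grade at each study-time level
# (assumed coding: 1 = <2 h, 2 = 2-5 h, 3 = 5-10 h, 4 = >10 h).
mean_by_study <- tapply(toy$G1, toy$studytime, mean)
print(mean_by_study)

# Welch two-sample t-test comparing non-participants with participants.
activity_test <- t.test(G1 ~ activities, data = toy)
print(activity_test$estimate)
print(activity_test$p.value)
```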
Family Factors
===
Column {.tabset data-width=550}
---
### Family Score vs. S1
```{r}
thedata%>%
ggplot(aes(x=famrel, y= G1))+
geom_point(col= "sienna3")+
geom_smooth(col="black")+
labs(x="Family Relations Score", y="Semester 1 Grade", title = "Distribution of Semester 1 Grades Over Family Relations Score")
```
### Family Score vs. S2
```{r}
thedata%>%
ggplot(aes(x=famrel, y= G2))+
geom_point(col= "maroon4")+
geom_smooth(col="black")+
labs(x="Family Relations Score", y="Semester 2 Grade", title = "Distribution of Semester 2 Grades Over Family Relations Score")
```
### Family Score vs. Study
```{r}
thedata%>%
ggplot(aes(x=famrel, y= studytime))+
geom_point(col= "darkorange")+
geom_smooth(col="black")+
labs(x="Family Relations Score", y="Study Time in Hours/Week", title = "Distribution of Study Time Over Family Relations Score")
```
### Cohabitation vs. S1
```{r}
thedata %>% ggplot(aes(x = factor(Pstatus), y = G1)) +
geom_boxplot(outlier.color = "goldenrod2", outlier.size = 3, fill = "darkslategray") +
labs(x= "Parent Cohabitation Status", y= "Semester 1 Grade", title = "Distribution of Semester 1 Grades Over Parent Living Status") +
  scale_x_discrete(labels = c("Apart", "Together"))
```
Column {data-width=450}
---
### Analysis and Conclusions
First, there seems to be a positive relationship between family relationship score and the grades students achieved in both semesters. In particular, the graph comparing semester 2 grades to family scores shows a noticeable increase in grades for those who answered 4 or 5. This suggests that the two variables are related: as family score increases, so does academic performance. Additionally, the graph comparing study time to family score shows something odd. The highest study times are found at 1, 4, and 5 on the family scale. This seems to suggest that either very bad or very good family relationships drive students to study more and possibly earn better grades (see the analysis on the previous tab). Lastly, cohabitation seems to have little effect on student grades, with a slightly higher median for students whose parents live apart. This might be due to less structured family time in those students' lives, leaving more time for study.
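
One way to put a number on the relationship between family score and grades is a rank correlation, which suits an ordinal 1 to 5 scale. The sketch below is a minimal illustration using Spearman's correlation from base R; it is not the analysis behind the conclusions above, and `toy` with its values is made up for illustration.

```r
# Toy stand-in for the survey data (values are made up for illustration).
toy <- data.frame(
  famrel = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 5),
  G2     = c(8, 9, 11, 10, 12, 12, 13, 13, 14, 15)
)

# Spearman's rank correlation between family score and second-semester grade.
# exact = FALSE requests the approximate p-value, which avoids a warning
# when the data contain ties.
fam_test <- cor.test(toy$famrel, toy$G2, method = "spearman", exact = FALSE)
print(fam_test$estimate)
print(fam_test$p.value)
```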
Alcohol Consumption
===
Column {.tabset data-width=550}
---
### Weekday Alc. vs. S1
```{r}
thedata %>%
ggplot(aes(x=Dalc, y=G1)) +
geom_point(color = "blue") +
geom_smooth(color = "orange") +
labs(x= "Alcohol Consumption on Weekdays", y="First Semester Grade") + ggtitle("Semester 1 Grades Over Alcohol on Weekdays")
```
### Weekday Alc. vs. S2
```{r}
thedata %>%
ggplot(aes(x=Dalc, y=G2)) +
geom_point(color = "blue") +
geom_smooth(color = "orange") +
labs(x= "Alcohol Consumption on Weekdays", y="Second Semester Grade", title = "Distribution of Second Semester Grades Over Alcohol on Weekdays")
```
### Weekend Alc. vs. S1
```{r}
thedata %>%
ggplot(aes(x=Walc, y=G1)) +
geom_point(color = "blue") +
geom_smooth(color = "orange") +
labs(x= "Alcohol Consumption on Weekends", y="First Semester Grade", title = "Distribution of First Semester Grades Over Alcohol on Weekends")
```
### Weekend Alc. vs. S2
```{r}
thedata %>%
ggplot(aes(x=Walc, y=G2)) +
geom_point(color = "blue") +
geom_smooth(color = "orange") +
labs(x= "Alcohol Consumption on Weekends", y="Second Semester Grade", title = "Distribution of Second Semester Grades Over Alcohol on Weekends")
```
Column {data-width=450}
---
### Analysis and Conclusions
According to the information presented, alcohol consumption has a negative relationship with the academic performance of high school students. Consumption on both weekdays and weekends brings the overall mean of grades downward. Weekday consumption in particular shows a downward trend in both semesters, with the average grade even dipping below 10 in both cases. Across all plots, the semester 2 trend lines are tighter, perhaps because students have settled into the structure of their year by then. I chose to keep the outliers in this data. Most likely, these students either failed or dropped the class early, which may not be related to alcohol consumption. However, the cause is unknown, so it could be due to social factors such as the ones tested here.
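
The effect of keeping the outliers can be checked by summarising grades with and without them. The sketch below is a minimal illustration that assumes the outliers are grades of 0; the text above does not state their values. The data frame `toy` and its values are made up for illustration.

```r
# Toy stand-in for the survey data (values are made up for illustration).
toy <- data.frame(
  Dalc = c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5),
  G2   = c(14, 13, 0, 12, 11, 11, 10, 9, 8, 0)
)

# Mean second-semester grade at each weekday alcohol level, all rows kept.
mean_all <- tapply(toy$G2, toy$Dalc, mean)

# Same summary after dropping grades of 0 (assumed here to mark a failed
# or abandoned course), to see how much those rows pull the means down.
nonzero <- toy[toy$G2 > 0, ]
mean_nonzero <- tapply(nonzero$G2, nonzero$Dalc, mean)

print(mean_all)
print(mean_nonzero)
```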
Author
===
Column {.tabset data-width=600}
---
### About the Author
My name is Andrew Jones and I am a second-year undergraduate student at the University of Dayton. I am currently pursuing a Bachelor of Science degree in Mathematics with a minor in Data Analytics. I plan on graduating in May 2026.

I am looking forward to a career in the data analytics field. I will gain significant experience in the Data Analytics Co-op position I will be filling in summer 2025 at Crown Equipment in New Bremen, Ohio.
### References and Limitations
**Limitations**:
Unfortunately, I could not find data from schools within the United States that could answer my research questions, so I had to settle for secondary school data from Portugal. In the future, I hope that American school data can be used for these purposes. Additionally, my time and information constraints limited my exploration of this data. There are still many comparisons and analyses that could be made with these variables that I could not address. Lastly, the categorical values based on ranges are quite ambiguous in this study. For instance, we do not know how much alcohol students actually consumed, only the self-reported score for how much they thought they drank. This is subjective, so more definitive and quantifiable measures of some variables would increase the certainty of comparisons and trends in future work.
**References**:
"High School Alcoholism and Academic Performance" from Gabriel One -
https://www.kaggle.com/datasets/gabrielluizone/high-school-alcoholism-and-academic-performance?select=student-lpor.csv
Column {data-width=400}
---
### Picture of Me
```{r Picture, fig.width=5, echo= FALSE, fig.height= 5}
knitr::include_graphics("IMG_8315(Edited).jpg")
```